Skip to content

[MachineLICM] Do not rely on hasSideEffect when handling impdefs #145379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

atrosinenko
Copy link
Contributor

Take clobbered registers described as implicit-def operands into
account when deciding if an instruction can be moved.

When hasSideEffect was set to 0 in the definition of LOADgotAUTH,
MultiSource/Benchmarks/Ptrdist/ks/ks test from llvm-test-suite started
to crash. The issue was traced down to MachineLICM pass placing
LOADgotAUTH right after an unrelated copy to x16 like rewriting this
code:

bb.0:
  renamable $x16 = COPY renamable $x12
  B %bb.1

bb.1:
  ...
  /* use $x16 */
  ...
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  /* use $x20 */
  ...

like the following:

bb.0:
  renamable $x16 = COPY renamable $x12
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  B %bb.1

bb.1:
  ...
  /* use $x16 */
  ...
  /* use $x20 */
  ...

@atrosinenko atrosinenko requested review from asl and pcc June 23, 2025 18:24
@atrosinenko
Copy link
Contributor Author

Relates to #130809, #141330.

@atrosinenko atrosinenko marked this pull request as ready for review June 23, 2025 19:12
@llvmbot
Copy link
Member

llvmbot commented Jun 23, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Anatoly Trosinenko (atrosinenko)

Changes

Take clobbered registers described as implicit-def operands into
account when deciding if an instruction can be moved.

When hasSideEffect was set to 0 in the definition of LOADgotAUTH,
MultiSource/Benchmarks/Ptrdist/ks/ks test from llvm-test-suite started
to crash. The issue was traced down to MachineLICM pass placing
LOADgotAUTH right after an unrelated copy to x16 like rewriting this
code:

bb.0:
  renamable $x16 = COPY renamable $x12
  B %bb.1

bb.1:
  ...
  /* use $x16 */
  ...
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @<!-- -->some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  /* use $x20 */
  ...

like the following:

bb.0:
  renamable $x16 = COPY renamable $x12
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @<!-- -->some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  B %bb.1

bb.1:
  ...
  /* use $x16 */
  ...
  /* use $x20 */
  ...

Full diff: https://github.com/llvm/llvm-project/pull/145379.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineLICM.cpp (+7-10)
  • (added) llvm/test/CodeGen/AArch64/mlicm-implicit-defs.mir (+53)
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index c9079170ca575..5841a9ffa7a99 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -559,18 +559,15 @@ void MachineLICMImpl::ProcessMI(MachineInstr *MI, BitVector &RUDefs,
       if (!MO.isDead())
         // Non-dead implicit def? This cannot be hoisted.
         RuledOut = true;
-      // No need to check if a dead implicit def is also defined by
-      // another instruction.
-      continue;
+    } else {
+      // FIXME: For now, avoid instructions with multiple defs, unless
+      // it's a dead implicit def.
+      if (Def)
+        RuledOut = true;
+      else
+        Def = Reg;
     }
 
-    // FIXME: For now, avoid instructions with multiple defs, unless
-    // it's a dead implicit def.
-    if (Def)
-      RuledOut = true;
-    else
-      Def = Reg;
-
     // If we have already seen another instruction that defines the same
     // register, then this is not safe.  Two defs is indicated by setting a
     // PhysRegClobbers bit.
diff --git a/llvm/test/CodeGen/AArch64/mlicm-implicit-defs.mir b/llvm/test/CodeGen/AArch64/mlicm-implicit-defs.mir
new file mode 100644
index 0000000000000..51cb35116dc62
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/mlicm-implicit-defs.mir
@@ -0,0 +1,53 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64 -run-pass machinelicm -verify-machineinstrs -o - %s | FileCheck %s
+# RUN: llc -mtriple=aarch64 -passes machinelicm -o - %s | FileCheck %s
+
+---
+name: test
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: test
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $x0
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $x16 = COPY killed $x0
+  ; CHECK-NEXT:   B %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $x16
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $x1 = COPY killed $x16
+  ; CHECK-NEXT:   $x2 = MOVi64imm 1024, implicit-def dead $x16
+  ; CHECK-NEXT:   $x16 = LDRXroX killed $x1, killed $x2, 0, 0
+  ; CHECK-NEXT:   $xzr = SUBSXri $x16, 0, 0, implicit-def $nzcv
+  ; CHECK-NEXT:   Bcc 1, %bb.1, implicit $nzcv
+  ; CHECK-NEXT:   B %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   liveins: $x1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $x0 = COPY killed $x1
+  ; CHECK-NEXT:   RET_ReallyLR
+  bb.0:
+    liveins: $x0
+    $x16 = COPY killed $x0
+    B %bb.1
+
+  bb.1:
+    liveins: $x16
+    $x1 = COPY killed $x16
+    /* MOVi64imm below mimics a pseudo instruction that doesn't have any */
+    /* unmodelled side effects, but uses x16 as a scratch register.      */
+    $x2 = MOVi64imm 1024, implicit-def dead $x16
+    $x16 = LDRXroX killed $x1, killed $x2, 0, 0
+    $xzr = SUBSXri $x16, 0, 0, implicit-def $nzcv
+    Bcc 1, %bb.1, implicit $nzcv
+    B %bb.2
+
+  bb.2:
+    liveins: $x1
+    $x0 = COPY killed $x1
+    RET_ReallyLR
+...

@atrosinenko atrosinenko force-pushed the users/atrosinenko/machine-licm-implicit-defs branch from 05c3da9 to 079d654 Compare June 26, 2025 12:04
@atrosinenko
Copy link
Contributor Author

Fixed the test failures: the original fix ruled out safe-to-move instructions with dead implicit-def operands. Fixed this in 079d654 and updated precommitted test cases accordingly.

Copy link
Contributor Author

atrosinenko commented Jul 1, 2025

Take clobbered registers described as implicit-def operands into
account when deciding if an instruction can be moved.

When hasSideEffect was set to 0 in the definition of LOADgotAUTH,
MultiSource/Benchmarks/Ptrdist/ks/ks test from llvm-test-suite started
to crash. The issue was traced down to MachineLICM pass placing
LOADgotAUTH right after an unrelated copy to x16 like rewriting this
code:

    bb.0:
      renamable $x16 = COPY renamable $x12
      B %bb.1

    bb.1:
      ...
      /* use $x16 */
      ...
      renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
      /* use $x20 */
      ...

like the following:

    bb.0:
      renamable $x16 = COPY renamable $x12
      renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
      B %bb.1

    bb.1:
      ...
      /* use $x16 */
      ...
      /* use $x20 */
      ...
@atrosinenko atrosinenko force-pushed the users/atrosinenko/machine-licm-implicit-defs branch from 079d654 to 707a36c Compare July 1, 2025 13:26
@atrosinenko
Copy link
Contributor Author

Ping.

@@ -585,6 +580,8 @@ void MachineLICMImpl::ProcessMI(MachineInstr *MI, BitVector &RUDefs,
}

RUDefs.set(Unit);
if (MO.isImplicit())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes implicit defs special here (compared to explicit defs)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the special case in 4c29db1, thank you!

After investigating HoistRegionPostRA and ProcessMI functions for a while, it feels like the description of RUDefs and RUClobbers are hardly accurate:

  BitVector RUDefs(NumRegUnits);     // RUs defined once in the loop.
  BitVector RUClobbers(NumRegUnits); // RUs defined more than once.

These two variables are defined in HoistRegionPostRA and used both in HoistRegionPostRA and ProcessMI by setting and testing bits as well as by mass-settint bits in applyBitsNotInRegMaskToRegUnitsMask.

I have an impression that RUClobbers is sometimes set while it would be better to first check RUDefs (and possibly set corresponding bit there instead), such as when handling MO.isRegMask() case a few lines above - it looks harmless, even though slightly pessimistic, to me.

At the same time, RUDefs seems to mix two different properties of a register: "the register is live" vs. "we have observed an instruction that defined the register that was initially dead" - while both should transition to "clobbered" state after defining that register one more time, only the latter implies some sort of non-uniformity. Thus, when HoistRegionPostRA rejects candidate instructions as follows:

    MachineInstr *MI = Candidate.MI;
    for (const MachineOperand &MO : MI->all_uses()) {
      if (!MO.getReg())
        continue;
      for (MCRegUnit Unit : TRI->regunits(MO.getReg())) {
        if (RUDefs.test(Unit) || RUClobbers.test(Unit)) {
          // If it's using a non-loop-invariant register, then it's obviously
          // not safe to hoist.
          Safe = false;
          break;
        }
      }

      if (!Safe)
        break;
    }

it looks like rejecting any instructions having any register inputs given that

    // Conservatively treat live-in's as an external def.
    // FIXME: That means a reload that're reused in successor block(s) will not
    // be LICM'ed.
    for (const auto &LI : BB->liveins()) {
      for (MCRegUnit Unit : TRI->regunits(LI.PhysReg))
        RUDefs.set(Unit);
    }

Quite surprisingly, it is not, because on X86 something like this is possible (here $rip is not a live-in):

renamable $rcx = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @x, $noreg, debug-location !22 :: (load (s64) from got); t.ll:5:7

Nevertheless, assuming that the only reason to treat implicit register operands specially is to ignore dead implicit defs for simplicity, it looks harmless to handle any defined register identically, whether it is implicit or explicit.

@atrosinenko
Copy link
Contributor Author

The test failure looks unrelated:

2025-07-03T17:09:06.5937349Z CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /opt/llvm/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/gha/actions-runner/_work/llvm-project/llvm-project/build/unittests/Target/AArch64 -I/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/unittests/Target/AArch64 -I/home/gha/actions-runner/_work/llvm-project/llvm-project/build/include -I/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/include -I/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/lib/Target/AArch64 -I/home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/Target/AArch64 -I/home/gha/actions-runner/_work/llvm-project/llvm-project/third-party/unittest/googletest/include -I/home/gha/actions-runner/_work/llvm-project/llvm-project/third-party/unittest/googlemock/include -gmlt -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17  -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -Wno-suggest-override -MD -MT unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/AArch64SelectionDAGTest.cpp.o -MF unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/AArch64SelectionDAGTest.cpp.o.d -o unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/AArch64SelectionDAGTest.cpp.o -c /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
2025-07-03T17:09:06.5945157Z /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp:65:10: error: no matching member function for call to 'init'
2025-07-03T17:09:06.5945952Z    65 |     DAG->init(*MF, ORE, nullptr, nullptr, nullptr, nullptr, nullptr, MMI,
2025-07-03T17:09:06.5946363Z       |     ~~~~~^~~~
2025-07-03T17:09:06.5947046Z /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/include/llvm/CodeGen/SelectionDAG.h:472:17: note: candidate function not viable: requires 10 arguments, but 9 were provided
2025-07-03T17:09:06.5947891Z   472 |   LLVM_ABI void init(MachineFunction &NewMF, OptimizationRemarkEmitter &NewORE,
2025-07-03T17:09:06.5948282Z       |                 ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5948605Z   473 |                      Pass *PassPtr, const TargetLibraryInfo *LibraryInfo,
2025-07-03T17:09:06.5948915Z       |                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5949270Z   474 |                      UniformityInfo *UA, ProfileSummaryInfo *PSIin,
2025-07-03T17:09:06.5949577Z       |                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5949880Z   475 |                      BlockFrequencyInfo *BFIin, MachineModuleInfo &MMI,
2025-07-03T17:09:06.5950197Z       |                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5950516Z   476 |                      FunctionVarLocs const *FnVarLocs, bool HasDivergency);
2025-07-03T17:09:06.5950852Z       |                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5951606Z /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/include/llvm/CodeGen/SelectionDAG.h:478:8: note: candidate function not viable: requires 10 arguments, but 9 were provided
2025-07-03T17:09:06.5952434Z   478 |   void init(MachineFunction &NewMF, OptimizationRemarkEmitter &NewORE,
2025-07-03T17:09:06.5953062Z       |        ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5953374Z   479 |             MachineFunctionAnalysisManager &AM,
2025-07-03T17:09:06.5953640Z       |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5953947Z   480 |             const TargetLibraryInfo *LibraryInfo, UniformityInfo *UA,
2025-07-03T17:09:06.5954271Z       |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5954600Z   481 |             ProfileSummaryInfo *PSIin, BlockFrequencyInfo *BFIin,
2025-07-03T17:09:06.5954908Z       |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5955231Z   482 |             MachineModuleInfo &MMI, FunctionVarLocs const *FnVarLocs,
2025-07-03T17:09:06.5955553Z       |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5955810Z   483 |             bool HasDivergency) {
2025-07-03T17:09:06.5956038Z       |             ~~~~~~~~~~~~~~~~~~
2025-07-03T17:09:06.5956245Z 1 error generated.

By the way, the PRs stacked on top of this one passed the tests - maybe check-llvm-unit tests are not executed for these PRs.

@@ -554,23 +554,18 @@ void MachineLICMImpl::ProcessMI(MachineInstr *MI, BitVector &RUDefs,
}

if (MO.isImplicit()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's unclear to me is why implicit defs are special here as well. I uploaded #147624 which merges the logic for implicit and explicit defs, and it passes the new test from this PR. I also ran a stage2 check-llvm and check-clang and they both pass. It requires updates to some target-specific tests; I'm not an expert on all of the affected ISAs but they seem correct.

@atrosinenko
Copy link
Contributor Author

Superseded by #147624.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants